Classifying Arabic text using KNN classifier.

Abstract

With the tremendous number of electronic documents now available, there is a great need to classify documents automatically. Classification is the task of assigning objects (images, text documents, etc.) to one of several predefined categories. Because the selection of important terms is vital to classifier performance, this paper uses feature-set reduction techniques such as stop-word removal, stemming, and term thresholding. Three term-selection techniques are applied to a corpus of 1,000 documents that fall into five categories, and a comparative study is performed to find the effect of using full-word, stem, and root term-indexing methods. A k-nearest-neighbors (KNN) classifier is used in this study. The averages over all folds of recall, precision, fallout, and error rate were calculated. The results of the experiments carried out on the dataset show the importance of using k-fold testing, since it shows how the recall, precision, fallout, and error rate of each category vary across the 10 folds.
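The feature-reduction pipeline and classifier described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list and the prefix/suffix tables are tiny hypothetical samples, `light_stem` is a toy light stemmer standing in for stem indexing (root extraction is not shown), the term threshold is modeled as a hypothetical minimum document frequency `min_df`, and documents are compared by cosine similarity over raw term-frequency vectors, which the abstract does not specify.

```python
import math
from collections import Counter

# Hypothetical toy stop-word list (illustrative only; a real system needs a full Arabic list).
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "أن"}

# Hypothetical light-stemming affixes, longest prefixes first (not the paper's stemmer).
PREFIXES = ("وال", "بال", "كال", "فال", "ال")
SUFFIXES = ("ات", "ون", "ين", "ها", "ة")


def light_stem(word: str) -> str:
    """Strip at most one known prefix and one known suffix, keeping at least 2 letters."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 2:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 2:
            word = word[: -len(s)]
            break
    return word


def terms(text: str) -> list[str]:
    """Tokenize on whitespace, drop stop words, then stem each remaining token."""
    return [light_stem(t) for t in text.split() if t not in STOP_WORDS]


def build_vocab(docs: list[list[str]], min_df: int = 2) -> set[str]:
    """Term threshold (assumed form): keep terms occurring in at least min_df documents."""
    df = Counter(t for d in docs for t in set(d))
    return {t for t, n in df.items() if n >= min_df}


def vectorize(doc: list[str], vocab: set[str]) -> Counter:
    """Raw term-frequency vector restricted to the retained vocabulary."""
    return Counter(t for t in doc if t in vocab)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(query: Counter, train: list[tuple[Counter, str]], k: int = 3) -> str:
    """Majority vote among the k training documents most similar to the query."""
    neighbors = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Replacing `light_stem` with an identity function gives full-word indexing, and a root extractor would slot into the same place, which is one way to compare the three indexing methods while keeping the rest of the pipeline fixed.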
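The evaluation protocol can be sketched the same way. The abstract does not give its metric formulas, so the code below assumes the usual one-vs-rest definitions for a category, with TP, FP, FN and TN counted over one test fold: recall = TP/(TP+FN), precision = TP/(TP+FP), fallout = FP/(FP+TN), and error rate = (FP+FN)/(TP+FP+FN+TN). Each metric is computed per category in every fold and then averaged over the k folds. `predict_fold` is a hypothetical hook for whichever classifier is evaluated (for example a KNN classifier), and the folds are contiguous, so the labels are assumed to be shuffled beforehand.

```python
from collections.abc import Callable
from statistics import mean


def category_metrics(y_true: list[str], y_pred: list[str], category: str) -> dict[str, float]:
    """One-vs-rest recall, precision, fallout and error rate for a single category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == category and p == category for t, p in pairs)
    fp = sum(t != category and p == category for t, p in pairs)
    fn = sum(t == category and p != category for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    def ratio(num: int, den: int) -> float:
        # Report 0.0 when the denominator is empty (e.g. the category was never predicted).
        return num / den if den else 0.0

    return {
        "recall": ratio(tp, tp + fn),  # share of true members that were found
        "precision": ratio(tp, tp + fp),  # share of predictions that were correct
        "fallout": ratio(fp, fp + tn),  # share of non-members wrongly assigned
        "error_rate": ratio(fp + fn, len(pairs)),  # share of all decisions that were wrong
    }


def k_fold_indices(n: int, k: int = 10) -> list[list[int]]:
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def average_over_folds(per_fold: list[dict[str, float]]) -> dict[str, float]:
    """Average every metric across the folds."""
    return {name: mean(fold[name] for fold in per_fold) for name in per_fold[0]}


def cross_validate(
    labels: list[str],
    k: int,
    predict_fold: Callable[[list[int], list[int]], list[str]],
) -> dict[str, dict[str, float]]:
    """Run k-fold testing and return fold-averaged metrics for each category.

    Assumes `labels` is already shuffled: contiguous folds over a corpus sorted
    by category would leave some categories out of training entirely.
    `predict_fold(train_idx, test_idx)` is a hypothetical hook that trains on
    train_idx and returns predicted labels for test_idx.
    """
    categories = sorted(set(labels))
    per_category: dict[str, list[dict[str, float]]] = {c: [] for c in categories}
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        y_true = [labels[i] for i in test_idx]
        y_pred = predict_fold(train_idx, test_idx)
        for c in categories:
            per_category[c].append(category_metrics(y_true, y_pred, c))
    return {c: average_over_folds(folds) for c, folds in per_category.items()}
```

With `k=10` and a 1,000-document corpus like the one described above, each fold holds 100 test documents and 900 training documents. To inspect the fold-to-fold variation the abstract emphasizes, return `per_category` itself instead of (or alongside) its averages.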
